* ===================<Chapters> ===================
*1. Preparation
*2. Merging datasets
*3. Adjusting Dataset to Target Sample Size
*4. Creating Variables for Data Analysis
* =================================================



* ===================<Target Aimag list> ===================
*(4)21 Dornod
*(3)41 Tuv
*(3)43 Selenge
*(3)46 Umnugovi
*(2)64 Bayankhongor
*(2)65 Arkhangai
*(1)81 Zavkhan
*(1)85 Uvs
* ===========================================================


* ===================<Other notes> ===================
* definition of missing values
** no answer: .
** can't answer: .a
** don't know: .b
* ===========================================================
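
* A minimal sketch of how these codes behave (illustrative only, not part of the
* original workflow): Stata orders missing values as . < .a < .b, so a condition
* such as "x < ." excludes all three codes, while "x != ." excludes only ".".
* Each line below displays 1.
display (. < .a) & (.a < .b)
display missing(.a)
display (.b != .)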



* =========================
* =========================
* 1. Preparation 
* =========================
* =========================

clear all

tempfile admin_data admin_data_s1 admin_data_s2 admin_data_id1  merged1 merged2 beforedrop



* =========================
* =========================
* 2. Merging datasets
* =========================
* =========================


* =========================
* Step 1: Prepare the admin data
* We need two otherwise identical copies of the admin data so that they can be
* merged with the questionnaire data by s1 (respondent's national ID) and by
* s2 (spouse's national ID).
* We also need a copy keyed on the questionnaire ID with spouse records dropped
* (one questionnaire ID usually has two rows: the respondent and the spouse).
* =========================
				  

import excel rawdata/confidential/admin_data.xlsx, sheet("Sheet1") firstrow clear


save `admin_data', replace
use `admin_data', clear

* Drop duplicate national IDs and save the result to be merged later.
rename National_ID s1
gen admin_order=_n
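* Optional diagnostic (a sketch, not part of the original workflow): report how
* many national IDs appear more than once before the duplicates are dropped.
duplicates report s1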
duplicates drop s1, force
save `admin_data_s1', replace

** admin_data_s1 is the admin data with duplicate national IDs removed

rename s1 s2
foreach x of varlist sum_pen-p2017_12{
rename `x' `x'_s2
}
save `admin_data_s2', replace
** admin_data_s2 is the same as admin_data_s1 except that the ID is renamed s2 and
** the variables sum_pen through p2017_12 carry the suffix "_s2".
** It is merged later as the records of the respondent's spouse.

use `admin_data', clear
rename National_ID s1
keep if spouse==0
save `admin_data_id1', replace
* admin_data_id1 is the admin data restricted to spouse==0 (respondents only)
* It is merged later as the respondents' records



* =========================
* Step 2: Merge the admin data into the survey data
* Some observations (276) can be matched by the survey ID, which the pension office used as a supplementary matching key
* =========================
			   				   
use rawdata/confidential/survey_data.dta, clear


* Merge with admin data via "s1" (respondents' national ID)
merge m:1 s1 using `admin_data_s1', force

drop if _merge==2
rename _merge merge_s1
save `merged1', replace
** `merged1' is the merged data of survey and admin records via respondents' national ID.


keep if merge_s1==1
** keep observations that were not matched via s1 (national ID)

drop spouse-p2017_12
** drop pension records


* =========================

* Merge again with admin data via "id" (survey ID)
merge m:1 id using `admin_data_id1', force

drop if _merge==2
rename _merge merge_id1
save `merged2', replace
** `merged2' is the merged data of survey and admin records via respondents' survey ID.


use `merged1', clear
drop if merge_s1==1


* append `merged1' (merged via national ID) and `merged2' (merged via survey ID).
append using `merged2'


* =========================

* Merge with admin data via "s2" (national ID of the respondent's spouse)
merge m:1 s2 using `admin_data_s2', force


drop if _merge==2
rename _merge merge_s2


* =========================
* Step 3: Attach the bagh ID to merge the dataset with treatment status
* =========================

* Generate soum and bagh ID variable in the survey data
gen NSO_aimag_code = aimag
gen NSO_soum_code = soum
gen NSO_bag_code = bag_recode

gen s_Soum_ID = NSO_aimag_code*100+soum
gen s_Bag_ID = NSO_aimag_code*10000+NSO_soum_code*100+bag_recode if bag_recode!=0
gen s_Bag_ID_noai = soum*100+bag_recode


* =========================
* * Generate intervention variable
* * Not used, because this is the participants' own answer
* * and may be unreliable

* destring s6 ,replace force
* replace s6=. if s6>8 &  s6<.
* replace s6=. if s6<5
* replace s6=.  if s6==6.3

* gen intervention = s6 
* label variable intervention "Explanatory material"
* label define intervention 5 "Control" 6 "Disability and survivors pension" 7 "Trust" 8 "Mobile banking"
* label values intervention intervention


bysort s_Bag_ID: generate n1 = _N
*bysort s_Bag_ID intervention: generate n2 = _N
gen num_surveys_in_bag = n1
*gen num_leaflets_in_bag = n2


* =========================

* Merge with the initial intervention plan data
merge m:1 s_Bag_ID using intermediate/target_bags, force
drop if _merge==2

rename horse horse_bagh
rename cattle cattle_bagh
rename camel camel_bagh
rename sheep sheep_bagh
rename goat goat_bagh



* =========================
* =========================
* 3. Adjusting Dataset to Target Sample Size
* =========================
* =========================

* =========================
* Step 1: Generate variables to drop unnecessary observations
* =========================

gen pre_intervention=treatment+5
replace pre_intervention=. if treatment==.

*gen 	match=1 if pre_intervention==intervention
*replace match=0 if pre_intervention!=intervention
*label 	values match yesno

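* The value label is defined here because its original definition is commented out above
label define intervention 5 "Control" 6 "Disability and survivors pension" 7 "Trust" 8 "Mobile banking", replace
label values pre_intervention intervention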


bysort NSO_aimag_code NSO_soum_code : gen soum_oneobs =1 if _n==1
bysort NSO_aimag_code NSO_soum_code NSO_bag_code: gen bagh_oneobs =1 if _n==1 & bag_recode!=0
bysort NSO_aimag_code: egen Obs_Soums = count(soum_oneobs)
bysort NSO_aimag_code: egen Obs_Baghs = count(bagh_oneobs)

label variable soum_oneobs "soum"
label variable bagh_oneobs "bagh"

gen Num_Soums =.
replace Num_Soums =14 if NSO_aimag_code==21
replace Num_Soums =27 if NSO_aimag_code==41
replace Num_Soums =17 if NSO_aimag_code==43
replace Num_Soums =15 if NSO_aimag_code==46
replace Num_Soums =20 if NSO_aimag_code==64
replace Num_Soums =19 if NSO_aimag_code==65
replace Num_Soums =24 if NSO_aimag_code==81
replace Num_Soums =19 if NSO_aimag_code==85

gen Rate_Soums = Obs_Soums / Num_Soums
format %3.2f Rate_Soums


gen Num_Baghs =.
replace Num_Baghs =66  if NSO_aimag_code==21
replace Num_Baghs =97  if NSO_aimag_code==41
replace Num_Baghs =56  if NSO_aimag_code==43
replace Num_Baghs =58  if NSO_aimag_code==46
replace Num_Baghs =104 if NSO_aimag_code==64
replace Num_Baghs =101 if NSO_aimag_code==65
replace Num_Baghs =115 if NSO_aimag_code==81
replace Num_Baghs =93  if NSO_aimag_code==85

gen Rate_Baghs = Obs_Baghs / Num_Baghs
format %3.2f Rate_Baghs
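
* Optional check (a sketch, not part of the original workflow): one line per
* aimag with the share of soums and baghs observed in the survey. Both rates
* are constant within an aimag, so the mean simply reports that value.
tabstat Rate_Soums Rate_Baghs, by(NSO_aimag_code) statistics(mean) format(%4.2f)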


* Constant
gen cons = 1


* Cleaning: recode 0 to missing
foreach x of var qs1_1-qs1_2_3 qs1_3_1 qs1_3_2 qs1_4 qs1_5-qs1_8 {
replace `x'=. if `x'==0
}

* Female 
gen female = qs1_1 - 1
label variable female "Female"
label define female 0 "Male" 1 "Female"
label values female female
note female: 0=male 1=female 

* Age at 2017
gen age = 2017- qs1_2_1
label variable age "Age"
label values age age
				   
gen overage =0
replace overage =1 if (age>=60 & age<.) & female==0
replace overage =1 if (age>=55 & age<.) & female==1
*replace overage =0 if (age>=16 & age<=55) & female==.

* Coverage of voluntary social insurance 
gen vol =.
replace vol =0 if qs1_4== 4| qs1_4== 5
replace vol =1 if qs1_4== 1| qs1_4== 2| qs1_4== 3| qs1_4== 6| qs1_4== 7| qs1_4== 8| qs1_4== 9| qs1_4== 10| qs1_4== 11| qs1_4== 12

label variable vol "individuals for voluntary social insurance"
label define vol 0 "No" 1 "Yes"
label values vol vol
note vol: vol =0 if qs1_4== 4| qs1_4== 5 , vol =1 if qs1_4== 1| qs1_4== 2| qs1_4== 3| qs1_4== 6| qs1_4== 7| qs1_4== 8| qs1_4== 9| qs1_4== 10| qs1_4== 11| qs1_4== 12



* =========================
* Step 2: Generate variables to drop observations with invalid ID
* =========================

* Generate tags of duplication in national ID
duplicates tag s1, gen(dplc_s1)


* =========================

* Generate variables for ID check by birthday correspondence
gen str bdayID=substr(s1,5,6)
gen double bday= qs1_2_1*10000 + qs1_2_2*100 + qs1_2_3
tostring bday, replace
gen str bday2=substr(bday,3,6)

gen 	bdayIDmt1=  bday2==bdayID
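
* Worked illustration with hypothetical values (not part of the original
* workflow): a respondent reporting a birth date of 1985-03-07 gives
* bday = "19850307", so bday2 keeps the six characters "850307", which is then
* compared with bytes 5-10 of the national ID.
display substr("19850307", 3, 6)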

* =========================

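* Generate variables for ID check by logical inconsistencies
* (substr() positions are byte offsets; presumably each of the two leading
* Cyrillic letters occupies two bytes, so the eight digits start at byte 5)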
gen str ID1=substr(s1,1,1)
gen str ID2=substr(s1,3,1)
gen str ID3=substr(s1,5,1)
gen str ID5=substr(s1,7,1)
gen str ID7=substr(s1,9,1)
gen str ID9=substr(s1,11,1)
gen str ID10=substr(s1,12,1)
gen ID1wr =ID1=="0" |ID1=="1"|ID1=="2"|ID1=="3"|ID1=="4"|ID1=="5" |ID1=="6"|ID1=="7"|ID1=="8"|ID1=="9"
gen ID2wr =ID2=="0" |ID2=="1"|ID2=="2"|ID2=="3"|ID2=="4"|ID2=="5" |ID2=="6"|ID2=="7"|ID2=="8"|ID2=="9"
gen ID3wr =ID3=="1" |ID3=="2"|ID3=="3"|ID3=="4"
gen ID5wr =ID5=="2" |ID5=="3"|ID5=="4"|ID5=="5" |ID5=="6" |ID5=="7" |ID5=="8" |ID5=="9" 
gen ID7wr =ID7=="4" |ID7=="5"|ID7=="6"|ID7=="7" |ID7=="8" |ID7=="9" 
gen ID10wr =ID10==""
gen IDwr = ID1wr==1|ID2wr==1|ID3wr==1|ID5wr==1|ID7wr==1|ID10wr==1

gen str ID1_2=substr(s1,1,4)
duplicates tag ID1_2, gen(IDcyr_dp)

gen str ID3_10=substr(s1,5,8)
duplicates tag ID3_10, gen(IDnum_dp)

gen str ID9_10=substr(s1,11,2)
duplicates tag ID9_10, gen(IDnum2_dp)

gen IDnumg=.
replace IDnumg=0 if ID9 =="1"|ID9 =="3"|ID9 =="5"|ID9 =="7"|ID9 =="9"
replace IDnumg=1 if ID9 =="0"|ID9 =="2"|ID9 =="4"|ID9 =="6"|ID9 =="8"


* =========================

* Generate variables for ID check by logical inconsistencies (spouse)
gen str ID1_s2=substr(s2,1,1)
gen str ID2_s2=substr(s2,3,1)
gen str ID3_s2=substr(s2,5,1)
gen str ID5_s2=substr(s2,7,1)
gen str ID7_s2=substr(s2,9,1)
gen str ID9_s2=substr(s2,11,1)
gen str ID10_s2=substr(s2,12,1)
gen ID1wr_s2 =ID1_s2=="0" |ID1_s2=="1"|ID1_s2=="2"|ID1_s2=="3"|ID1_s2=="4"|ID1_s2=="5" |ID1_s2=="6"|ID1_s2=="7"|ID1_s2=="8"|ID1_s2=="9"
gen ID2wr_s2 =ID2_s2=="0" |ID2_s2=="1"|ID2_s2=="2"|ID2_s2=="3"|ID2_s2=="4"|ID2_s2=="5" |ID2_s2=="6"|ID2_s2=="7"|ID2_s2=="8"|ID2_s2=="9"
gen ID3wr_s2 =ID3_s2=="1" |ID3_s2=="2"|ID3_s2=="3"|ID3_s2=="4"
gen ID5wr_s2 =ID5_s2=="2" |ID5_s2=="3"|ID5_s2=="4"|ID5_s2=="5" |ID5_s2=="6" |ID5_s2=="7" |ID5_s2=="8" |ID5_s2=="9" 
gen ID7wr_s2 =ID7_s2=="4" |ID7_s2=="5"|ID7_s2=="6"|ID7_s2=="7" |ID7_s2=="8" |ID7_s2=="9" 
gen ID10wr_s2 =ID10_s2==""
gen IDwr_s2 = ID1wr_s2==1|ID2wr_s2==1|ID3wr_s2==1|ID5wr_s2==1|ID7wr_s2==1|ID10wr_s2==1

gen str ID1_2_s2=substr(s2,1,4)
duplicates tag ID1_2_s2, gen(IDcyr_s2_dp)

gen str ID3_10_s2=substr(s2,5,8)
duplicates tag ID3_10_s2, gen(IDnum_s2_dp)

gen str ID9_10_s2=substr(s2,11,2)
duplicates tag ID9_10_s2, gen(IDnum2_s2_dp)

gen IDnumg_s2=.
replace IDnumg_s2=0 if ID9_s2 =="1"|ID9_s2 =="3"|ID9_s2 =="5"|ID9_s2 =="7"|ID9_s2 =="9"
replace IDnumg_s2=1 if ID9_s2 =="0"|ID9_s2 =="2"|ID9_s2 =="4"|ID9_s2 =="6"|ID9_s2 =="8"


* =========================

* Generate matching variable
* (admin data variables: Хүйс = sex, Нас = age)
gen 	genderID= .  if Хүйс=="NULL"
replace genderID= 1  if Хүйс=="1"
replace genderID= 2  if Хүйс=="0"
gen 	genderIDmt1= qs1_1==genderID   


gen ageID= Нас
gen agegap= ageID-qs1_2_4

gen ageIDmt1=   agegap>=0&agegap<=2


gen IDmt1 = IDwr==0 & bdayIDmt1==1 & ageIDmt1==1 & genderIDmt1==1

gen IDmt1_s2 = IDwr_s2==0 


* anonymize national ID, telephone number, birth day. 
replace s1="1" if s1!="0"
replace s2="1" if s2!="0"
replace s7=1 if s7>0 & s7<.
replace s8=1 if s8>0 & s8<.
replace qs1_2_3=99 if qs1_2_3!=0

replace s1="." if s1=="0"
replace s2="." if s2=="0"

drop bdayID bday bday2 ID1-ID10 ID1_2 ID3_10 ID9_10


* Count number of baghs

save `beforedrop', replace

keep pre_intervention s_Bag_ID
duplicates drop 
tab pre_intervention


* =========================
* Step 3: Drop invalid data 
* =========================

use `beforedrop', clear

count 
* 29,024

count if pre_intervention==. | overage==1 | vol!=1 
drop  if pre_intervention==. | overage==1 | vol!=1 

count 
 * 21,328

count if s4<3 | s4>7
* 1,835

drop  if s4<3 | s4>7
count if IDmt1!=1 
count if dplc_s1!=0

gen shoulddrop =1 if IDmt1!=1 | dplc_s1!=0
count if shoulddrop ==1
* 6,781

drop if shoulddrop ==1 
count 
* 12,712



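* The drops above only document the sample counts at each stage;
* reload the full data and drop only the over-age respondents.
use `beforedrop', clear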
drop  if overage==1



* Modify the spouses' pension records (set to . if the spouse's ID is unmatched)

foreach x of varlist sum_pen_s2-p2017_12_s2{
replace `x' =. if IDmt1_s2!=1
}

* =========================
* =========================
* 4. Creating Variables for Data Analysis
* =========================
* ========================= 

* Recode "no answer / unwilling to answer" to missing values (.a or .b)
replace qs1_6=.a if qs1_6==4
replace qs1_7=.b if qs1_7==6
replace qs1_8=.b if qs1_8==4
replace qs1_8=.a if qs1_8==5


* age group
gen age_group = .
replace age_group = 1 if age>=10 & age<20
replace age_group = 2 if age>=20 & age<30
replace age_group = 3 if age>=30 & age<40
replace age_group = 4 if age>=40 & age<50
replace age_group = 5 if age>=50 & age<60


* living aimag
gen liv_ai=qs1_3_1
label variable liv_ai "Aimag of residence"
label define liv_ai 21 "Dornod" 41 "Tuv" 43 "Selenge" 46 "Umnugovi" 64 "Bayankhongor" 65 "Arkhangai" 81 "Zavkhan" 85 "Uvs"
label values liv_ai liv_ai
note liv_ai:421 Dornod 341 Tuv 343 Selenge 346 Umnugovi 264 Bayankhongor 265 Arkhangai 181 Zavkhan 185 Uvs 
replace liv_ai=.c if liv_ai==11 | liv_ai==22 | liv_ai==67


* job
gen job =.
replace job =1 if qs1_4==1
replace job =2 if qs1_4==3
replace job =3 if qs1_4==2| qs1_4==6 | qs1_4==7 | qs1_4==8 | qs1_4==10
replace job =4 if qs1_4==9| qs1_4==12
replace job =5 if qs1_4==11
label variable job "Job"
label define job 1 "Herders" 2 "Self-employed" 3 "Other workers" 4 "Jobless" 5 "Students", replace
label values job job
note job: 1 Herders(1), 2 Self-employed(3), 3 Other workers (for voluntary pen.) (2)(6)(7)(8)(10), 4 Jobless (9)(12), 5 Students(11)

gen herder 			= job==1
gen self_employed 	= job==2
gen other_workers 	= job==3
gen jobless 		= job==4
gen students 		= job==5

label variable herder Herder 
label variable self_employed Self_employed 
label variable other_workers Other_workers 
label variable jobless Jobless 
label variable students Students 


* Friendship, Japan trust
gen ja_fr = 3-qs1_6
gen ja_tr = 5-qs1_7
label variable ja_fr "Friend Japan"
label variable ja_tr "Trust Japan"

gen ja_tr_dm =ja_tr==4 if ja_tr<.
label variable ja_tr_dm "Trust Japan dummy"

gen ja_fr_dm =ja_fr==2 if ja_fr<.
label variable ja_fr_dm "Friend Japan dummy"

* Treated at home
gen athome = qs1_8==3 if qs1_8<.


* =========================
* data cleaning [qs2]
* =========================

* Recode 0 ("no answer") to missing (.)

replace qs2_1 =. if qs2_1==0
replace qs2_2 =. if qs2_2==0

* =========================
* Generate variables for analysis

gen lottery_b = qs2_1
gen lottery_s = qs2_2

gen qs2_type =.
* Most risk-lover
replace qs2_type =1 if qs2_1==1 & qs2_2==1
* Somewhat risk-lover
replace qs2_type =2 if qs2_1==1 & qs2_2==2
* Somewhat risk-averse
replace qs2_type =3 if qs2_1==2 & qs2_2==1
* Most risk-averse
replace qs2_type =4 if qs2_1==2 & qs2_2==2

label define qs2_1 1 "Risk-lover in the big lottery"   2 "Risk-averse in the big lottery"
label define qs2_2 1 "Risk-lover in the small lottery" 2 "Risk-averse in the small lottery"

label values qs2_1 qs2_1
label values qs2_2 qs2_2


label variable qs2_type "Risk preference"
label define qs2_type 1 "Most risk-lover" 2 "Somewhat risk-lover" 3 "Somewhat risk-averse" 4 "Most risk-averse"
label values qs2_type qs2_type

note qs2_type: "Most risk-lover" replace qs2_type =1 if qs2_1==1 & qs2_2==1, "Somewhat risk-lover" replace qs2_type =2 if qs2_1==1 & qs2_2==2, "Somewhat risk-averse" replace qs2_type =3 if qs2_1==2 & qs2_2==1, "Most risk-averse" replace qs2_type =4 if qs2_1==2 & qs2_2==2


gen qs2_type1 =.
* Risk-lover
replace qs2_type1 =1 if  qs2_1==1 & qs2_2==1
* Middle
replace qs2_type1 =2 if (qs2_1==1 & qs2_2==2)|(qs2_1==2 & qs2_2==1) 
* Risk-averse
replace qs2_type1 =3 if  qs2_1==2 & qs2_2==2


* =========================
* data cleaning [qs3]
* =========================


* Recode 0 ("no answer") to missing (.)

replace qs3_1 =. if qs3_1==0
replace qs3_2 =. if qs3_2==0
replace qs3_3 =. if qs3_3==0
replace qs3_4 =. if qs3_4==0


* =========================
* Generate variables for analysis

gen 	qs3_type1 =.
* Patient
replace qs3_type1 =1 if qs3_1==2 & qs3_2==2 
* Somewhat impatient
replace qs3_type1 =2 if qs3_1==1 & qs3_2==2
* Most impatient
replace qs3_type1 =3 if qs3_1==1 & qs3_2==1

label variable qs3_type1 "Impatience in one month"
label define qs3_type1 1 "Patient" 2 "Somewhat impatient" 3 "Most impatient"
label values qs3_type1 qs3_type1
note qs3_type1: "Patient" replace qs3_type1 =1 if qs3_1==2 & qs3_2==2, "Somewhat impatient" replace qs3_type1 =2 if qs3_1==1 & qs3_2==2, "Most impatient" replace qs3_type1 =3 if qs3_1==1 & qs3_2==1


gen qs3_type2 =.
*Patient
replace qs3_type2 =1 if qs3_3==2 & qs3_4==2 
*Somewhat impatient
replace qs3_type2 =2 if qs3_3==1 & qs3_4==2 
*Most impatient
replace qs3_type2 =3 if qs3_3==1 & qs3_4==1

label variable qs3_type2 "Impatience in thirteen months"
label values qs3_type2 qs3_type1
note qs3_type2: "Patient" replace qs3_type2 =1 if qs3_3==2 & qs3_4==2, "Somewhat impatient" replace qs3_type2 =2 if qs3_3==1 & qs3_4==2, "Most impatient" replace qs3_type2 =3 if qs3_3==1 & qs3_4==1


gen qs3_type =. 
*(1) Consistent
replace qs3_type =1 if qs3_type1==1 & qs3_type2 ==1
replace qs3_type =1 if qs3_type1==2 & qs3_type2 ==2 
*(2) Hyperbolic
replace qs3_type =2 if qs3_type1==2  & qs3_type2 ==1 
replace qs3_type =2 if qs3_type1==3  & qs3_type2 ==1
replace qs3_type =2 if qs3_type1==3  & qs3_type2 ==2
*(3) Patient now and Impatient later
replace qs3_type =3 if qs3_type1==1 & qs3_type2 == 2
replace qs3_type =3 if qs3_type1==1 & qs3_type2 == 3
replace qs3_type =3 if qs3_type1==2 & qs3_type2 == 3
*(1) Consistent (continued): most impatient at both horizons
replace qs3_type =1 if qs3_type1== 3 & qs3_type2 == 3
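
* Optional check (a sketch, not part of the original workflow): in the cross-tab
* of the two horizons, the diagonal cells should map to qs3_type 1 (consistent),
* cells with qs3_type1 > qs3_type2 to 2 (more impatient now than later), and
* cells with qs3_type1 < qs3_type2 to 3 (more patient now than later).
tabulate qs3_type1 qs3_type2, summarize(qs3_type) nostandard nofreq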


gen consistent=0
replace consistent=1 if qs3_type==1

gen hyperbolic=0
replace hyperbolic=1 if qs3_type==2

gen future_bias=0
replace future_bias=1 if qs3_type==3


label variable consistent Consistent
label variable hyperbolic "Present biased"
label variable future_bias "Future biased"


label variable qs3_type "Time preference"
label define qs3_type 1 "Consistent" 2 "Hyperbolic" 3 "Patient now and Impatient later" 
label values qs3_type qs3_type


* =========================
* data cleaning [qs4]
* =========================

* Recode "no answer / unwilling to answer" to missing values (., .a or .b)

foreach x of var qs4_1-qs4_13 qs4_15_a-qs4_15_e {
replace `x'=. if `x'==0
}

foreach x of var qs4_6-qs4_11 qs4_4_1_1 qs4_4_2_1 {
replace `x'=.b if `x'==3
}

foreach x of var qs4_12-qs4_13 {
replace `x'=.a if `x'==6
}

foreach x of var qs4_15_a-qs4_15_e {
replace `x'=.a if `x'==8
}


* family members
gen num_fam_mem = qs4_2_1
gen num_child = qs4_3_1_2
gen num_child_fc = qs4_3_1_2
replace num_child_fc = 0 if qs4_3_1_1==2
replace num_child_fc = 4 if qs4_3_1_2>=4 & qs4_3_1_2<=19

gen child_dm = 2- qs4_3_1_1	

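* center is a user-written command (ssc install center); without the
* standardize option it only subtracts the mean, despite the sd_ prefix
center num_fam_mem num_child, prefix(sd_)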


* parents
gen fa_death=.
replace fa_death=1 if qs4_4_1_2<60
replace fa_death=0 if qs4_4_1_2>=60 & qs4_4_1_2<120

gen ma_death=.
replace ma_death=1 if qs4_4_2_2<55
replace ma_death=0 if qs4_4_2_2>=55 & qs4_4_2_2<120


* marry (based on legal relationship)
gen marry =.
replace marry =1 if qs4_1==2 | qs4_1==4
replace marry =0 if qs4_1==1 | qs4_1==3 | qs4_1==5 | qs4_1==6
label variable marry "Marriage status"
label define yesno 1 "Yes" 0 "No"
label values marry yesno
note marry: =1 if qs4_1==2 | qs4_1==4, 0 if qs4_1==1 | qs4_1==3 | qs4_1==5 | qs4_1==6


* education 
gen edu =.
replace edu =1 if qs4_5 ==1 | qs4_5 ==2
replace edu =2 if qs4_5 ==3
replace edu =3 if qs4_5 ==4
replace edu =4 if qs4_5 ==5 | qs4_5 ==6 | qs4_5 ==7
replace edu =5 if qs4_5 ==8 | qs4_5 ==9 | qs4_5 ==10


label variable edu "Educational background"
label define edu 1 "Elementary school or less" 2 "Junior high school" 3 "High school" 4 "Professional education" 5 "Higher education"
label values edu edu
note edu: =1 if qs4_5 ==1 | qs4_5 ==2, =2 if qs4_5 ==3, =3 if qs4_5 ==4, =4 if qs4_5 ==5 | qs4_5 ==6 | qs4_5 ==7, =5 if qs4_5 ==8 | qs4_5 ==9 | qs4_5 ==10

* bank_information
gen bank_account= 	2 - qs4_6
gen saving=			2 - qs4_7
gen loan= 			2 - qs4_8
gen borrow= 		2 - qs4_9
gen herder_loan= 	2 - qs4_10
gen use_mob_bank= 	2 - qs4_11


* income 
gen income= qs4_12
label variable income "Income"
label values income income

gen 	income_dm = 1 if income>0 & income <.
replace income_dm = 0 if income_dm>=.


* stock
gen ger 	= qs4_14_1 
gen house 	= qs4_14_2 
gen car 	= qs4_14_3 
gen bike 	= qs4_14_4 
gen TV 		= qs4_14_5

local stock "ger house car bike TV"


* animal
gen cows 	=qs4_15_a 
gen horses 	=qs4_15_b 
gen camels 	=qs4_15_c 
gen sheep 	=qs4_15_d 
gen goats 	=qs4_15_e


local animal "cows horses camels sheep goats"


* =========================
* data cleaning [qs5]
* =========================

					   
* Recode "no answer / unwilling to answer" to missing values (., .a or .b)

foreach x of var qs5_1-qs5_3 qs5_6-qs5_8 qs5_10-qs5_11  qs5_13 qs5_15-qs5_21{
replace `x'=. if `x'==0
}

foreach x of var qs5_2 qs5_6-qs5_8 qs5_16-qs5_18 {
replace `x'=.b if `x'==3
}

foreach x of var qs5_2 qs5_6-qs5_8 qs5_16-qs5_18  {
replace `x'=.a if `x'==4
}

foreach x of var qs5_1 qs5_3 qs5_15 {
replace `x'=.b if `x'==4
}

foreach x of var qs5_1 qs5_3 qs5_15 qs5_19-qs5_21 {
replace `x'=.a if `x'==5
}

foreach x of var qs5_10  {
replace `x'=.b if `x'==5
}

foreach x of var qs5_11 {
replace `x'=.b if `x'==10
}



* pension(admin)
gen p2017_q1 =p2017_1 +p2017_2 +p2017_3
gen p2017_q2 =p2017_4 +p2017_5 +p2017_6
gen p2017_q3 =p2017_7 +p2017_8 +p2017_9
gen p2017_q4 =p2017_10+p2017_11+p2017_12

gen p2017_q2_4 =p2017_q2 + p2017_q3 + p2017_q4
gen p2017_q2_3 =p2017_q2 + p2017_q3
gen p2017_q3_4 =p2017_q3 + p2017_q4


* pension(survey)
gen pen =.
replace pen = 3- qs5_1

label variable pen "Social insurance participation"
label define pen 0 "Not participate" 1 "Participate but not pay" 2 "Participate and pay", replace
label values pen pen

gen pen_d =.
replace pen_d = 1 if qs5_1==1
replace pen_d = 0 if qs5_1==2 |qs5_1==3
label variable pen_d "Social insurance participation (yes or no)"

gen pen_d_weak =.
replace pen_d_weak = 1 if qs5_1==1 | qs5_1==2
replace pen_d_weak = 0 if qs5_1==3
label variable pen_d_weak "Social insurance participation (yes or no, incl. non-paying)"

gen pay =.
replace pay =0 if qs5_3==3
replace pay =1 if qs5_3==2
replace pay =2 if qs5_3==1
label variable pay "Social insurance payment"

gen pen_info= qs5_5_1 + qs5_5_2+qs5_5_3+qs5_5_4+qs5_5_5+qs5_5_6+qs5_5_7+qs5_5_9
gen     pen_info_dm=0 if pen_info>=0 & pen_info<=1
replace pen_info_dm=1 if pen_info>=2 & pen_info<=9

label variable qs5_5_1 "Peninfo: Radio" 
label variable qs5_5_2 "Peninfo: TV" 
label variable qs5_5_3 "Peninfo: Internet" 
label variable qs5_5_4 "Peninfo: Magazines or newspaper" 
label variable qs5_5_5 "Peninfo: Explanatory material" 
label variable qs5_5_6 "Peninfo: Inspectors" 
label variable qs5_5_7 "Peninfo: Word-of-mouth" 
label variable qs5_5_8 "Peninfo: No information" 

gen health_ins =.
replace health_ins = 3- qs5_15
label variable health_ins "Health insurance payment"


gen life_expect =.
replace life_expect = qs5_11
label variable life_expect "Expected longevity"

gen 	life_expect_dm = 1 if (life_expect<.&life_expect>=4&female==0)|(life_expect<.&life_expect>=6&female==1)
replace life_expect_dm = 0 if (life_expect< 4&female==0)|(life_expect<6 &female==1)
label variable life_expect_dm "Expected longevity dummy"

gen parents_pen = 2-qs5_6

* visit soum center
foreach x in a b c d {
tempvar visit_`x'1 visit_`x'2 visit_`x'3 visit_`x'4 visit_`x'5 visit_`x'6 avg_`x'1 avg_`x'2 avg_`x'3

gen `visit_`x'1' = regexs(1) if regexm(qs5_12`x'_1, "([0-9]*[0-9]*[0-9])[-]([0-9]*[0-9]*[0-9])")
gen `visit_`x'2' = regexs(2) if regexm(qs5_12`x'_1, "([0-9]*[0-9]*[0-9])[-]([0-9]*[0-9]*[0-9])")
gen `visit_`x'3' = regexs(1) if regexm(qs5_12`x'_1, "([0-9]*[0-9]*[0-9])[.]([0-9]*[0-9]*[0-9])")
gen `visit_`x'4' = regexs(2) if regexm(qs5_12`x'_1, "([0-9]*[0-9]*[0-9])[.]([0-9]*[0-9]*[0-9])")
gen `visit_`x'5' = regexs(1) if regexm(qs5_12`x'_1, "([0-9]*[0-9]*[0-9])[,]([0-9]*[0-9]*[0-9])")
gen `visit_`x'6' = regexs(2) if regexm(qs5_12`x'_1, "([0-9]*[0-9]*[0-9])[,]([0-9]*[0-9]*[0-9])")

destring `visit_`x'1' `visit_`x'2' `visit_`x'3' `visit_`x'4' `visit_`x'5' `visit_`x'6', replace
egen `avg_`x'1' = rowmean(`visit_`x'1' `visit_`x'2')
egen `avg_`x'2' = rowmean(`visit_`x'3' `visit_`x'4')
egen `avg_`x'3' = rowmean(`visit_`x'5' `visit_`x'6')

gen 	 visit_`x' = qs5_12`x'_1
destring visit_`x', replace force
replace  visit_`x' = `avg_`x'1' if strpos(qs5_12`x'_1, "-")
replace  visit_`x' = `avg_`x'2' if strpos(qs5_12`x'_1, ".")
replace  visit_`x' = `avg_`x'3' if strpos(qs5_12`x'_1, ",")
replace  visit_`x' = .  if visit_`x' >31
}

rename visit_a visit_spr
rename visit_b visit_smr
rename visit_c visit_aut
rename visit_d visit_wtr

gen visit_all= visit_spr +visit_smr +visit_aut +visit_wtr
** visit_all is missing if any of the seasonal visit_... variables is missing.

center visit_all, prefix(sd_)
gen visit_few_dm = sd_visit_all<0 if sd_visit_all!=.
label variable visit_few_dm "Few visit dummy"

* hours to soum center
foreach x in a b c d {
tempvar hours_`x'1 hours_`x'2 hours_`x'3 hours_`x'4 hours_`x'5 hours_`x'6 avg_`x'1 avg_`x'2 avg_`x'3

gen `hours_`x'1' = regexs(1) if regexm(qs5_12`x'_3_1, "([0-9]*[0-9]*[0-9])[-]([0-9]*[0-9]*[0-9])")
gen `hours_`x'2' = regexs(2) if regexm(qs5_12`x'_3_1, "([0-9]*[0-9]*[0-9])[-]([0-9]*[0-9]*[0-9])")
gen `hours_`x'3' = regexs(1) if regexm(qs5_12`x'_3_1, "([0-9]*[0-9]*[0-9])[.]([0-9]*[0-9]*[0-9])")
gen `hours_`x'4' = regexs(2) if regexm(qs5_12`x'_3_1, "([0-9]*[0-9]*[0-9])[.]([0-9]*[0-9]*[0-9])")
gen `hours_`x'5' = regexs(1) if regexm(qs5_12`x'_3_1, "([0-9]*[0-9]*[0-9])[,]([0-9]*[0-9]*[0-9])")
gen `hours_`x'6' = regexs(2) if regexm(qs5_12`x'_3_1, "([0-9]*[0-9]*[0-9])[,]([0-9]*[0-9]*[0-9])")

destring `hours_`x'1' `hours_`x'2' `hours_`x'3' `hours_`x'4' `hours_`x'5' `hours_`x'6', replace
egen `avg_`x'1' = rowmean(`hours_`x'1' `hours_`x'2')
egen `avg_`x'2' = rowmean(`hours_`x'3' `hours_`x'4')
egen `avg_`x'3' = rowmean(`hours_`x'5' `hours_`x'6')
gen hours_`x' = qs5_12`x'_3_1
destring hours_`x', replace force
replace hours_`x' = `avg_`x'1' if strpos(qs5_12`x'_3_1, "-")
replace hours_`x' = `avg_`x'2' if strpos(qs5_12`x'_3_1, ".")
replace hours_`x' = `avg_`x'3' if strpos(qs5_12`x'_3_1, ",")

tempvar minutes_`x'1 minutes_`x'2 minutes_`x'3 minutes_`x'4 minutes_`x'5 minutes_`x'6 avg_`x'1 avg_`x'2 avg_`x'3

gen `minutes_`x'1' = regexs(1) if regexm(qs5_12`x'_3_2, "([0-9]*[0-9]*[0-9])[-]([0-9]*[0-9]*[0-9])")
gen `minutes_`x'2' = regexs(2) if regexm(qs5_12`x'_3_2, "([0-9]*[0-9]*[0-9])[-]([0-9]*[0-9]*[0-9])")
gen `minutes_`x'3' = regexs(1) if regexm(qs5_12`x'_3_2, "([0-9]*[0-9]*[0-9])[.]([0-9]*[0-9]*[0-9])")
gen `minutes_`x'4' = regexs(2) if regexm(qs5_12`x'_3_2, "([0-9]*[0-9]*[0-9])[.]([0-9]*[0-9]*[0-9])")
gen `minutes_`x'5' = regexs(1) if regexm(qs5_12`x'_3_2, "([0-9]*[0-9]*[0-9])[,]([0-9]*[0-9]*[0-9])")
gen `minutes_`x'6' = regexs(2) if regexm(qs5_12`x'_3_2, "([0-9]*[0-9]*[0-9])[,]([0-9]*[0-9]*[0-9])")

destring `minutes_`x'1' `minutes_`x'2' `minutes_`x'3' `minutes_`x'4' `minutes_`x'5' `minutes_`x'6', replace
egen `avg_`x'1' = rowmean(`minutes_`x'1' `minutes_`x'2')
egen `avg_`x'2' = rowmean(`minutes_`x'3' `minutes_`x'4')
egen `avg_`x'3' = rowmean(`minutes_`x'5' `minutes_`x'6')
gen minutes_`x' = qs5_12`x'_3_2
destring minutes_`x', replace force
replace minutes_`x' = `avg_`x'1' if strpos(qs5_12`x'_3_2, "-")
replace minutes_`x' = `avg_`x'2' if strpos(qs5_12`x'_3_2, ".")
replace minutes_`x' = `avg_`x'3' if strpos(qs5_12`x'_3_2, ",")
}

* foreach x in a b c d {
* replace hours_`x'=0 if hours_`x'==.
* replace minutes_`x'=0 if minutes_`x'==.
* }

gen transportation_spr= qs5_12a_2_1
gen transportation_smr= qs5_12b_2_1
gen transportation_aut= qs5_12c_2_1
gen transportation_wtr= qs5_12d_2_1

label define transportation 1 "By car or bike" 2 "By horse" 3"On foot"
label values transportation_spr transportation
label values transportation_smr transportation
label values transportation_aut transportation
label values transportation_wtr transportation

gen time_spr= hours_a*60+minutes_a
gen time_smr= hours_b*60+minutes_b
gen time_aut= hours_c*60+minutes_c
gen time_wtr= hours_d*60+minutes_d

gen time_mean= (time_spr+time_smr+time_aut+time_wtr)/4
gen time_long_dm=0 if time_mean<109 
replace time_long_dm=1 if time_mean>=109 & time_mean<.

replace time_spr=. if time_spr==0
replace time_smr=. if time_smr==0
replace time_aut=. if time_aut==0
replace time_wtr=. if time_wtr==0


* bagh meeting
gen bagh_mtg =qs5_13


* bequest motive
replace qs5_14=. if qs5_14==0
replace qs5_14=. if qs5_14==6

gen 	bequest =. 
replace bequest =1 if qs5_14==1 
replace bequest =2 if qs5_14==2 | qs5_14==3 | qs5_14==4 
replace bequest =3 if qs5_14==5 

gen 	bequest2 =. 
replace bequest2 =1 if qs5_14==1 | qs5_14==2 | qs5_14==3 | qs5_14==4 
replace bequest2 =2 if qs5_14==5 

label variable bequest "bequest motive"
label define bequest 1 "No bequest motive" 2 "Conditional bequest motive" 3 "Strong bequest motive"
label define bequest2 1 "No or conditional bequest motive" 2 "Strong bequest motive"

label values bequest bequest
label values bequest2 bequest2

notes bequest: =1 if qs5_14==1, 2 if qs5_14==2 | qs5_14==3 | qs5_14==4, =3 if qs5_14==5  
notes bequest2: =1 if qs5_14==1| qs5_14==2 | qs5_14==3 | qs5_14==4, =2 if qs5_14==5  


* health information
gen sick 			=2-qs5_16
gen hospital 		=2-qs5_17
gen hospitalization =2-qs5_18
gen accidents 		=4-qs5_19
gen accidents_dm	=accidents>=0&accidents<=2


* tobacco and alcohol
gen tobacco = qs5_20
label variable tobacco "tobacco"

gen alcohol = qs5_21
label variable alcohol "alcohol"


* dzud 
gen dzud =. 
replace dzud =0 if qs5_22_4==1 
replace dzud =1 if qs5_22_1==1 | qs5_22_2==1| qs5_22_3==1
label variable dzud "dzud experience"
label values dzud yesno


* livestock insurance
gen livs_ins =.
replace livs_ins =0 if qs5_23_9==0
replace livs_ins =1 if qs5_23_1==1 | qs5_23_1==2 | qs5_23_1==3 | qs5_23_1==4 | qs5_23_1==5 | qs5_23_1==6 | qs5_23_1==7 | qs5_23_1==8 
label variable livs_ins "Livestock insurance experience"
label values livs_ins yesno

gen pen_expect = 4- qs5_10
label variable pen_expect "Pension expectation"

gen pen_expect_dm = pen_expect==3
replace pen_expect_dm=. if pen_expect>=.
label variable pen_expect_dm "Pension expectation dummy"



				   
* =========================
* Generate variables for regression
* =========================

gen strata = aimag*10+d_h_rate
tab strata, gen(strata_)


* Intervention variables (determined before the intervention)
gen 	pre_treated=0 if pre_intervention==5
replace pre_treated=1 if pre_intervention>=6 & pre_intervention<=8

gen 	pre_control = pre_intervention == 5
replace pre_control = . if pre_intervention == .

gen 	pre_disability = pre_intervention == 6
replace pre_disability = . if pre_intervention == .

gen 	pre_trust = pre_intervention == 7
replace  pre_trust =. if pre_intervention == .

gen 	pre_mobile = pre_intervention == 8
replace pre_mobile =. if pre_intervention == .


label  variable pre_treated 	"Treated" 
label  variable pre_control 	"Control-treatment" 
label  variable pre_disability 	"Disability-treatment" 
label  variable pre_trust 		"Trust-treatment" 
label  variable pre_mobile 		"Mobile-treatment" 


* Intervention variables (actually distributed at bagh meetings)
*gen 	post_treated = intervention >= 6 & intervention <= 8
*replace post_treated = . if intervention == . 
*gen 	post_control = intervention == 5
*replace post_control = . if intervention == .
*gen 	post_disability = intervention == 6
*replace post_disability = . if intervention == .
*gen 	post_trust = intervention == 7
*replace post_trust = . if intervention == .
*gen 	post_mobile = intervention == 8
*replace post_mobile = . if intervention == .

* Intervention variables (if the actual ones are matched with the plan)
*gen mtch_treated = pre_treated==1  & post_treated==1 
*replace mtch_treated = . if pre_treated==.  | post_treated==.
*gen mtch_control = pre_control==1  & post_control==1
*replace mtch_control = . if pre_control==.  | post_control==.
*gen mtch_disability = pre_disability==1  & post_disability==1
*replace mtch_disability = . if pre_disability==.  | post_disability==.
*gen mtch_trust = pre_trust==1  & post_trust==1
*replace mtch_trust = . if pre_trust==.  | post_trust==.
*gen mtch_mobile = pre_mobile==1  & post_mobile==1
*replace mtch_mobile = . if pre_mobile==.  | post_mobile==.
*gen mtch_intervention = intervention if pre_intervention==intervention
*replace mtch_intervention = . if pre_intervention==.  | intervention==.
*gen mtch_intervention_dm = mtch_intervention>=5 &  mtch_intervention<=8
*bysort s_Bag_ID : egen mtch_intervention_rate =mean(mtch_intervention_dm)


* Intervention variables (pre) (control vs. each treatment arm)
gen pre_treated_ctr = 	 0 if pre_control==1
gen pre_disability_ctr = 0 if pre_control==1
gen pre_trust_ctr = 	 0 if pre_control==1
gen pre_mobile_ctr = 	 0 if pre_control==1

replace pre_treated_ctr = 	 1 if pre_treated==1
replace pre_disability_ctr = 1 if pre_disability==1
replace pre_trust_ctr = 	 1 if pre_trust==1
replace pre_mobile_ctr = 	 1 if pre_mobile==1


* Intervention variables (mtch) (control vs. each treatment arm)
*gen mtch_treated_ctr 	= 0 if mtch_control==1
*gen mtch_disability_ctr = 0 if mtch_control==1
*gen mtch_trust_ctr 		= 0 if mtch_control==1
*gen mtch_mobile_ctr 	= 0 if mtch_control==1

*replace mtch_treated_ctr 	= 1 if mtch_treated==1
*replace mtch_disability_ctr = 1 if mtch_disability==1
*replace mtch_trust_ctr 		= 1 if mtch_trust==1
*replace mtch_mobile_ctr 	= 1 if mtch_mobile==1


* =========================
* Generate premium payment variables after interventions

foreach n of numlist 0/9 {
	gen p_`n'mafter = .
	foreach month of numlist 4/12 {
		replace p_`n'mafter = p2017_`month' if s4 + `n' == `month'
	}
}

gen 	p_3msumafter = .
replace p_3msumafter = p2017_5  + p2017_6  + p2017_7  if s4 == 4
replace p_3msumafter = p2017_6  + p2017_7  + p2017_8  if s4 == 5
replace p_3msumafter = p2017_7  + p2017_8  + p2017_9  if s4 == 6
replace p_3msumafter = p2017_8  + p2017_9  + p2017_10 if s4 == 7
replace p_3msumafter = p2017_9  + p2017_10 + p2017_11 if s4 == 8
replace p_3msumafter = p2017_10 + p2017_11 + p2017_12 if s4 == 9

gen 	p_sumafter = .
replace p_sumafter = p2017_12 + p2017_11 + p2017_10 + p2017_9 + p2017_8 + p2017_7 + p2017_6 + p2017_5 if s4 == 4
replace p_sumafter = p2017_12 + p2017_11 + p2017_10 + p2017_9 + p2017_8 + p2017_7 + p2017_6 if s4 == 5
replace p_sumafter = p2017_12 + p2017_11 + p2017_10 + p2017_9 + p2017_8 + p2017_7 if s4 == 6
replace p_sumafter = p2017_12 + p2017_11 + p2017_10 + p2017_9 + p2017_8 if s4 == 7
replace p_sumafter = p2017_12 + p2017_11 + p2017_10 + p2017_9 if s4 == 8
replace p_sumafter = p2017_12 + p2017_11 + p2017_10 if s4 == 9

gen p_3msumafter_dm =  p_3msumafter>0&p_3msumafter<.
replace p_3msumafter_dm=. if p_3msumafter==.
gen p_sumafter_dm =  p_sumafter>0&p_sumafter<.
replace p_sumafter_dm=. if p_sumafter==.
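
* Optional consistency check (a sketch, not part of the original pipeline).
* For s4 in 4-9 the 3-month sum should equal p_1mafter + p_2mafter + p_3mafter
* from the loop above. This assumes the monthly p2017_* values are small
* integers (for example 0/1 indicators), so that float storage is exact.
* The count is expected to be 0; it only reports and does not stop the do-file.
count if p_3msumafter != p_1mafter + p_2mafter + p_3mafter & inrange(s4, 4, 9)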


* =========================
* Generate premium payment variables before interventions
foreach n of numlist 1/9 {
	gen p_`n'mbefore = .
	foreach month of numlist 1/9 {
		replace p_`n'mbefore = p2017_`month' if s4 - `n' == `month'
	}
}


gen p_3msumbefore = .
replace p_3msumbefore = p2017_1 + p2017_2 + p2017_3 if s4 == 4
replace p_3msumbefore = p2017_2 + p2017_3 + p2017_4 if s4 == 5
replace p_3msumbefore = p2017_3 + p2017_4 + p2017_5 if s4 == 6
replace p_3msumbefore = p2017_4 + p2017_5 + p2017_6 if s4 == 7
replace p_3msumbefore = p2017_5 + p2017_6 + p2017_7 if s4 == 8
replace p_3msumbefore = p2017_6 + p2017_7 + p2017_8 if s4 == 9


gen p_sumbefore = 0
replace p_sumbefore = 0  if s4 == 1
replace p_sumbefore = p2017_1  if s4 == 2
replace p_sumbefore = p2017_1 + p2017_2  if s4 == 3
replace p_sumbefore = p2017_1 + p2017_2 + p2017_3 if s4 == 4
replace p_sumbefore = p2017_1 + p2017_2 + p2017_3 + p2017_4 if s4 == 5
replace p_sumbefore = p2017_1 + p2017_2 + p2017_3 + p2017_4 + p2017_5 if s4 == 6
replace p_sumbefore = p2017_1 + p2017_2 + p2017_3 + p2017_4 + p2017_5 + p2017_6 if s4 == 7
replace p_sumbefore = p2017_1 + p2017_2 + p2017_3 + p2017_4 + p2017_5 + p2017_6 + p2017_7 if s4 == 8
replace p_sumbefore = p2017_1 + p2017_2 + p2017_3 + p2017_4 + p2017_5 + p2017_6 + p2017_7 + p2017_8 if s4 == 9
replace p_sumbefore = p2017_1 + p2017_2 + p2017_3 + p2017_4 + p2017_5 + p2017_6 + p2017_7 + p2017_8 + p2017_9 if s4 == 10
replace p_sumbefore = p2017_1 + p2017_2 + p2017_3 + p2017_4 + p2017_5 + p2017_6 + p2017_7 + p2017_8 + p2017_9 + p2017_10 if s4 == 11
replace p_sumbefore = p2017_1 + p2017_2 + p2017_3 + p2017_4 + p2017_5 + p2017_6 + p2017_7 + p2017_8 + p2017_9 + p2017_10 + p2017_11 if s4 == 12
replace p_sumbefore = 0 if s4 == .
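
* A more compact way to build the same cumulative sum, shown as a sketch only.
* chk_sumbefore is a hypothetical scratch variable, dropped after the comparison.
* Assumes s4 holds integer month numbers and that the monthly p2017_* values
* are small integers, so that float storage is exact.
gen chk_sumbefore = 0
forvalues m = 1/11 {
	* add month m only for respondents whose intervention month is later than m
	replace chk_sumbefore = chk_sumbefore + p2017_`m' if s4 > `m' & s4 <= 12
}
* The count is expected to be 0 under the assumptions above.
count if chk_sumbefore != p_sumbefore
drop chk_sumbefore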


* =========================
* Generate dummy variables on participation before interventions
gen p_3msumbefore_dm =  p_3msumbefore>0
gen p_sumbefore_dm   =  p_sumbefore>0


replace p_3msumbefore_dm = . if p_3msumbefore==.
replace p_sumbefore_dm   = . if p_sumbefore==.


* =========================
* Generate dummy variables for participation in each year
forvalues x=2006/2017 {
gen p`x'_dm=p`x'>0 & p`x'<. 
}


* =========================
* Generate dummy variables for selecting groups
** newcustomer17(==1) selects individuals who had paid no contributions from 2006 up to the time of the intervention.
gen 	p2017before = .
replace p2017before = p2017_1+p2017_2+p2017_3

gen newcustomer   = p2017before==0&p2016==0&p2015==0&p2014==0&p2013==0&p2012==0&p2011==0&p2010==0&p2009==0&p2008==0&p2007==0&p2006==0
gen newcustomer17 = p2016==0&p2015==0&p2014==0&p2013==0&p2012==0&p2011==0&p2010==0&p2009==0&p2008==0&p2007==0&p2006==0&p_sumbefore_dm==0
gen newcustomer16 = p2016<.&p2016>=1&p2015==0&p2014==0&p2013==0&p2012==0&p2011==0&p2010==0&p2009==0&p2008==0&p2007==0&p2006==0

** suspended(==1) selects individuals who had participated in the social insurance before but paid no contributions from 2016 up to the intervention
gen suspended = p_sumbefore_dm==0&p2016==0&newcustomer17==0

** p_1ybefore(==1) selects individuals who paid any contribution between 2016 and the intervention
gen p_1ybefore = p_sumbefore_dm==1|(p2016>0&p2016<.)
label values p_1ybefore yesno

** p_allbefore(==1) selects individuals who paid any contribution between 2006 and the intervention (i.e. the inverse of newcustomer17)
gen p_allbefore = p_sumbefore_dm==1|p2016>0|p2015>0|p2014>0|p2013>0|p2012>0|p2011>0|p2010>0|p2009>0|p2008>0|p2007>0|p2006>0

label values p_allbefore yesno
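
* Optional check of the complement relationship stated above (a sketch).
* Note that Stata treats missing values as larger than any number, so a
* missing p2006-p2016 satisfies the ">0" conditions in p_allbefore.
* If the two variables are exact complements, only the off-diagonal cells
* of this table are populated.
tab newcustomer17 p_allbefore, missing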



gen  	p2016_pay4=. if p2016==0
replace p2016_pay4=1 if p2016>=1  & p2016<=3
replace p2016_pay4=2 if p2016>=4  & p2016<=6
replace p2016_pay4=3 if p2016>=7  & p2016<=9
replace p2016_pay4=4 if p2016>=10 & p2016<=12
label variable p2016_pay4 "Payment in 2016"
label define p2016_pay4 1 "(1) 1-3 times" 2 "(2) 4-6 times" 3 "(3) 7-9 times" 4 "(4) 10-12 times", replace
label values p2016_pay4 p2016_pay4


gen  	p2016_pay2=. if p2016==0
replace p2016_pay2=1 if p2016>=1  & p2016<=6
replace p2016_pay2=2 if p2016>=7  & p2016<=12
label variable p2016_pay2 "Payment in 2016"
label define p2016_pay2 1 "(1) 1-6 times" 2 "(2) 7-12 times" , replace
label values p2016_pay2 p2016_pay2


gen  	p2016_pay12=. if p2016==0
replace p2016_pay12=1 if p2016>=1  & p2016<=11
replace p2016_pay12=2 if p2016==12
label variable p2016_pay12 "Payment in 2016"
label define p2016_pay12 1 "(1) 1-11 times" 2 "(2) 12 times" , replace
label values p2016_pay12 p2016_pay12



* =========================
* Generate number of months from intervention to participation (including end point)
gen duration = 13 - s4 

foreach x of numlist 12/3 { 
replace duration = `x'-s4 if p2017_`x'==1
}

replace duration=. if s4<4 | s4>9
replace duration=. if newcustomer17==0
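
* Worked example of the definition above: with s4 == 5 and the first month
* with p2017_x == 1 being July (x = 7), duration = 7 - 5 = 2; with no such
* month through December, duration stays at 13 - 5 = 8.
* Optional check (a sketch): new customers have no payments before s4, so
* negative durations are not expected, assuming the monthly p2017_* values
* are never negative. The count is expected to be 0.
count if duration < 0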



* Identify bad_soums (short distributions: total of cons within the soum is 25 or less)
egen s_Soum_ID_total = total(cons), by(s_Soum_ID)
gen bad_soum = 1 if s_Soum_ID_total<=25


gen pen_mt=(p_1ybefore==0&pen_d==0)|(p_1ybefore==1&pen_d==1)
gen pen_mt2=(p_allbefore==0&pen_d_weak==0)|(p_allbefore==1&pen_d_weak==1)
gen pen_mt3=(p_1ybefore==0&pen_d==0)|(p_allbefore==1&pen_d_weak==1)


gen 	admin_class =0
replace admin_class =1 if admin_order>0 	& admin_order<=10000
replace admin_class =2 if admin_order>10000 & admin_order<=20000
replace admin_class =3 if admin_order>20000 & admin_order<=30000
replace admin_class =4 if admin_order>30000 & admin_order<=40000


gen accident = qs5_19
gen drink = qs5_21
gen sickness = qs5_17

center accident drink sickness, prefix(sd_)
gen riskscore = sd_accident + sd_drink + sd_sickness
label variable riskscore "Risk score"
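
* Optional check (a sketch): center is a user-written command (available
* from SSC). As far as its help file describes, it only subtracts the mean
* unless the standardize option is given; see "help center" to confirm.
* The summary below shows whether the sd_* variables have mean 0 and,
* if they were standardized, standard deviation 1.
summarize sd_accident sd_drink sd_sickness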

gen pre_disability_riskscore = pre_disability*riskscore
gen pre_trust_riskscore  = pre_trust*riskscore
gen pre_mobile_riskscore = pre_mobile*riskscore


gen 	highereduc2 = qs4_5>=4 if qs4_5<.
label variable 	highereduc2 "Education dummy" 

gen 	highereduc3 = qs4_5>=4
replace highereduc3 = . if qs4_5>=.


save intermediate/confidential/data_for_analysis.dta, replace

